Data Science Startups funded by Y Combinator (YC) 2026

May 2026

Browse 24 of the top Data Science startups funded by Y Combinator.

We also have a Startup Directory where you can search through over 5,000 companies.

Lotas
S2025
• Active • 2 employees • San Francisco, CA, USA
For our first product, we built Rao, an AI coding assistant integrated into RStudio, an IDE used by up to 5 million data scientists and statisticians who analyze data in the R programming language. Rao can read, write, and edit code while understanding the user's context (codebase and environmental variables).
data-science
developer-tools
data-visualization
artificial-intelligence

Fleetline
S2025
• Active • 7 employees • New York, NY, USA
Fleetline is building the first complete-context algorithmic load planner for mid- to large-sized trucking fleets. Today’s dispatchers face an overwhelming task: balancing regulations, customer demands, live fleet data, and individual driver needs, all while coordinating across siloed tools and teams. Mistakes run rampant, and each mistake costs thousands of dollars. Even the select fleets using algorithms that have fleet-wide context struggle: the outputs are rigid, hard to interpret, and blind to real-world nuances like driver preferences and schedule exceptions. Fleetline solves this by combining advanced optimization with LLMs that can easily adapt algorithms to each fleet’s needs and capture driver-specific information. The result is smarter planning truly optimized for every fleet and its drivers.
data-science
artificial-intelligence
logistics
supply-chain

Percival
P2025
• Active • 3 employees • San Francisco, CA, USA
Building an AI copilot for researchers to analyze and transform real-world data.
artificial-intelligence
data-science
developer-tools

Plexe
P2025
• Active • 2 employees • London
Plexe builds predictive ML models from a problem description. It connects to data sources, conducts experiments, evaluates and deploys the models to an API endpoint.
ai
machine-learning
data-science

Klavis AI
P2025
• Active • 3 employees
Powering frontier AI labs with real world MCP environments and complex, long-horizon agentic tool-use data.
reinforcement-learning
data-science
ai
artificial-intelligence

Sherpa Labs
W2025
• Active • 2 employees • New York City
Sherpa Labs is building an agentic data team that automates modeling, operations, and discovery via a swarm of agents. Their initial product is a data catalog that enables developers and AI agents to quickly locate the right data sources, effortlessly trace data lineage, and understand messy data lakes — similar to Glean but for enterprise data systems.
data-engineering
data-science
b2b
saas
ai

VortexifyAI
F2024
• Active • 3 employees • New York, NY, USA
Vortexify AI is a platform for building fully operational AI workflows tailored to supply chain operations. Deploy specialized, task-specific AI bots in days — not weeks. Our platform streamlines the creation of AI bots that can analyze millions of data rows, manage complex, long-horizon processes, and collaborate seamlessly with humans in the loop. Each AI bot comes equipped with custom tools maintained through AI-powered code editors directly within the Vortexify platform. Development teams can instantly generate dashboards, data pipelines, machine learning models, and custom functions — all contextualized to business goals and data requirements. Bots can operate in Co-pilot or Agent (autonomous) mode. They can be scheduled or triggered by real-time alerts and are governed by robust, natural language-generated guardrail templates that ensure safety, compliance, and reliability.
iot
data-science
ai-assistant
supply-chain
ai

Zoa Research
S2024
• Active • 5 employees • New York, NY, USA
Historically, quantitative models are domain specific. Brilliant people spend their best years testing features, tuning hyperparameters, and iterating architectures within a narrow domain. But scale is the panacea: large models will find patterns people, and specialized models, could not. Forecasting generalizes. Zoa trains cross-domain event forecasting engines.
*Automating Iteration* LLMs, embedded in multi-agent optimization loops and evaluated against fixed policies, can automate the build-test-improve modeling cycle. Think AlphaEvolve for forecasting problems.
*Sample-Efficient General Models* Today’s forecasting models are narrowly crafted with deep human priors. But larger models will outperform state-of-the-art specialized models. Unlike existing event models, our models leverage data from across contexts and rely less on human intuition. And compared to LLMs, our models are built with more inductive priors and rely more heavily on inference-time compute, improving sample efficiency.
*Why It Matters* In the real economy, our models could be useful for forecasting supply chain volatility, energy supply and demand, even earthquake risk. Science is, Ian Hacking writes, the taming of chance. It is the process of iteratively updating priors (something like: identify uncertainty, conceive experiment to reduce uncertainty, execute, update). If science is uncertainty-reduction, forecasting is a critical measure of progress. Better forecasting improves our ability to select interesting experiments (roughly those with greatest expected uncertainty reduction) and update priors. Our models will be used by labs and academics in data-heavy domains.
Sam's ex-girlfriend introduced him to Greg back at Carnegie Mellon in 2017, and while that relationship didn't last, their friendship has. After college, Greg went to Harvard Law School, while Sam worked for three years at Jane Street on their Options desk, building & leading a satellite dev team.
ai
data-science

Overstand Labs
W2025
• Active • 4 employees • New York City
Overstand is a data lab that allows our customers to navigate any set of data in just a few minutes. *Enterprise*: For enterprises, we unify Slack, email, calls, and operational data, then surface the signals that matter: customer needs, risks, and revenue opportunities hidden in everyday conversations. *Legal Firms*: For legal firms, we help them quickly understand their entire discovery corpus (either before or after document review) and build out an initial case assessment and facts. Instead of waiting on reports or manual analysis, teams get immediate, evidence-backed answers from the data they already have. Overstand delivers clarity and leverage as your business scales.
data-science
data-engineering
conversational-ai
artificial-intelligence
legaltech

Thunder Compute
S2024
• Active • 4 employees • San Francisco, CA, USA
One-click GPU instances with persistent storage, snapshots, and hot-swappable hardware, with the lowest prices anywhere.
infrastructure
cloud-computing
developer-tools
data-science
artificial-intelligence

MinusX
S2024
• Active • 2 employees • San Francisco, CA, USA
MinusX is a Chrome extension that adds a side chat to your analytics apps (Jupyter, Metabase, Grafana, Tableau, etc.). Given an instruction, our agent operates your apps by clicking and typing, just like you do, to analyze data and answer queries. We believe an AI Data Scientist is a scientist, not yet-another-new-analytics-platform. MinusX interoperates with you in tools you already love and use, and as a matter of philosophy, gets out of the way.
ai-assistant
analytics
data-science
machine-learning
ai

Mica AI
S2024
• Active • 3 employees • San Francisco, CA, USA
Mica's AI agents replace the data ops teams fixing bad data. When bad or missing data breaks the pipeline, and orchestration, retries, and monitoring fail, painful manual review work kicks in, pulling humans in to investigate and patch data issues across systems. Mica does what those humans do: gathering the right information from internal docs and external systems, reasoning across context, and resolving errors autonomously to get the pipeline moving again. The result: dramatically reduced time, cost, and operational drag as your data pipelines scale, without scaling ops headcount. Mica turns judgment-heavy data fixes from a manual bottleneck into an automated background process with full auditability.
data-engineering
data-science
enterprise-software

Metofico
W2024
• Active • 2 employees • London, UK
Metofico provides a no-code data analysis tool tailored for the life sciences. Our platform enables life scientists to analyse complex/massive datasets and extract necessary insights without needing advanced programming skills. This accessibility helps both researchers new to data science and experts save months of work. Metofico aims to be the leading centralized platform for data analysis in life science research, covering a wide range of applications from brain activity analysis (like photometry and EEG) to AI-powered detection and tracking of research animals. Our vision is to accelerate research processes and enhance the quality of research outputs across the board. By streamlining complex data analysis and making it more accessible, we’re committed to driving forward scientific discoveries and innovation.
saas
no-code
data-science
data-visualization

Preloop
W2024
• Active • 2 employees • Seattle
Only 2 out of 10 ML models make it from experiment to production. Preloop helps automate the process of deployment, helping companies realize more value from their machine learning teams, while focusing teams' attention on science instead of engineering.
artificial-intelligence
developer-tools
deep-learning
machine-learning
data-science

camelAI
W2024
• Active • 3 employees • San Francisco, CA, USA
For decades, companies have settled for software built for everyone, which means it's perfect for no one. They stitch together five SaaS tools to approximate one workflow, pay for seats they don't use, and file tickets with a data team just to answer a simple question. CamelAI is a different bet: your own AI software engineer, living on its own computer, building exactly what your business needs. You describe what you want. CamelAI builds it, deploys it to a live URL, and keeps it running. No developers required, no infrastructure to manage. Next week when your process changes, you ask again. The software changes with you. This is personal software: tools made for your team, your data, your workflows.
ai
data-visualization
data-science
saas

Cognitio Labs
S2023
• Active • San Francisco, CA, USA
Cognitio Labs is an applied AI research lab building real-time compliance infrastructure for regulated supply chains, serving as the first line of defense against recalls and regulatory failure. When a food contamination event happens, time is everything. Today, traceability relies on spreadsheets, PDFs, and manual logs, turning recalls into multi-day investigations. Entire product categories get destroyed, brands lose trust, and a single recall can cost $10M to $100M+. Why now: The FDA’s FSMA 204 rule requires companies to produce standardized digital traceability records within 24 hours by 2028, impacting over 60,000 U.S. food facilities. The regulatory bar is rising, but the infrastructure to meet it does not exist. Our first product line uses sensors and AI to automatically capture key events such as temperature, handling, and lot-level movements across production, storage, and transit. We convert fragmented operational data into standardized, FSMA 204-compliant traceability records in real time. Compliance is generated as operations happen, not after. This becomes the first line of defense by enabling faster recalls, reducing spoilage, eliminating manual compliance work, and protecting contracts, insurance, and brand equity. We are starting with food and expanding into other regulated, high-risk supply chains.
ai-assistant
aiops
b2b
data-science
artificial-intelligence

Sohar Health
S2023
• Active • 8 employees • New York, NY, USA
Sohar Health is an AI-driven front-end RCM solution that transforms insurance verification processes for healthcare providers, enabling faster patient conversions and reducing administrative workloads. With a 95% automation rate, our API-first platform seamlessly integrates into existing workflows to provide real-time claim accuracy and eligibility checks. Key performance metrics showcase the power of our technology: median latency of just 6 seconds, ensuring real-time eligibility; over 90% of checks returned within 30 seconds for increased patient conversion; 96% accuracy in identifying and verifying benefits details; industry-leading 99% accuracy for eligibility determination; a carve-out detection rate above 90%, mitigating surprise billing risks; and a 60% success rate with our Insurance Discovery API, helping identify coverage for self-pay patients. Our customers include outpatient clinics, specialty practices, and digital health platforms looking to streamline front-end claim management. By reducing errors and maximizing clean claim submissions, Sohar Health empowers healthcare organizations to convert and retain more patients, reduce operating costs, and increase their top-line revenue.
artificial-intelligence
digital-health
api
health-tech
data-science

Mito
S2020
• Active • 3 employees • New York, NY, USA
Mito is Cursor for data science. We’re building an AI-enabled IDE to 10x the productivity of data people. Data analysts use Mito to automate reports without relying on internal engineering resources. Mito is used by thousands of business analysts, data scientists, and automation engineers at some of the world's largest banks, private equity shops, and consulting firms. Open source is key to our enterprise sales strategy. Mito is open source and built on top of Jupyter. That means getting started with Mito is as simple as running `pip install` and doesn't require enterprises to manage new infrastructure. Check us out at: https://www.trymito.io/
analytics
open-source
developer-tools
data-science
ai

Cellbyte
W2022
• Active • 3 employees • Munich, Germany
Cellbyte's AI agents help pharma companies launch new drugs worldwide. Market-leading firms are using Cellbyte to answer questions like “What price can we achieve for our new drug in the U.S. versus Germany?” The three co-founders have known and worked with each other for many years: Daniel brings 5+ years of industry experience from his previous job at leading global Life Sciences consultancy Simon-Kucher. Felix has sold $3m+ ACV deals to customers like H&M for his previous YC startup. Samuel holds an MSc in Information Systems from TUM and has built enterprise AI applications from scratch as an ML Engineer at Celonis.
healthcare-it
data-science
ai

Basedash
S2020
• Active • 6 employees • Montreal, QC, Canada
Basedash is the AI-native Business Intelligence platform. Create dashboards and instantly understand your customers using natural language. Connect 500+ data sources, ask a question, and let Basedash visualize the answer.
b2b
saas
data-visualization
data-science
ai

Centaur
W2019
• Active • 45 employees • Boston, MA, USA
The best AI models aren’t just trained and evaluated with human data; they’re built with superhuman data. The strongest datasets emerge through collective intelligence, where humans and machines work together to outperform either one alone. At Centaur, we create superior quality data by turning annotation into an arena where experts and AI compete.
data-labeling
crowdsourcing
data-science
artificial-intelligence

Dost Education
W2017
• Active • 30 employees • Delhi, India
Dost is an ed-tech nonprofit building a platform to expand access to early childhood development in low-resource settings through parent education. Our mission is to unlock children’s full potential by focusing on early learning - the time when 90% of our brains develop. We believe that parents of any literacy level can play a critical role in developing their children’s school and life readiness. In India alone, there are 150 million under-resourced caregivers who can benefit from resources developed for them. That’s why our team at Dost Education - educators, entrepreneurs, and engineers - is passionate about using technology and user-centric product design to change the trajectory of families’ lives. Since 2017, we’ve grown from a small pilot with a few hundred mothers to reaching over 100,000 families working with state governments. Join us as we continue to innovate, grow our reach, and deepen our impact. We are supported by some of the best funders in the tech and social impact space, including Y Combinator, Mulago, and many others. Dost Education values diversity. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
education
india
nonprofit
data-science

Nova Credit
S2016
• Active • 100 employees • New York City
Nova Credit is a credit infrastructure and analytics company that enables businesses to grow responsibly through alternative credit data. As a Consumer Reporting Agency (CRA), Nova Credit leverages its unique data infrastructure, compliance framework, and credit expertise to help lenders fill critical gaps in traditional credit analytics. The company transforms the fragmented universe of consumer financial data into compliant, actionable risk insights through a comprehensive platform designed to increase conversion through expanded coverage, speed, and reliability. Leading organizations, including HSBC, RBC, SoFi, Scotiabank, Appfolio, and Yardi, work with Nova Credit to make smarter credit decisions through cash flow underwriting with Cash Atlas™, quickly verify income with Income Navigator, and reach new-to-country consumers with Credit Passport®. Nova Credit is backed by investors including Kleiner Perkins, General Catalyst, Index Ventures, and Canapi as well as executives from Goldman Sachs, JPMorgan, and Citi. Learn more at www.novacredit.com or reach out to connect@novacredit.com.
fintech
data-science

Pachyderm
W2015
• Acquired • 60 employees • San Francisco, CA, USA
Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you.
machine-learning
data-science
developer-tools
